Short Text Similarity Measure Based on Double Vector Space Model
نویسندگان
چکیده
Short text similarity measure is the basis of classification and duplicate checking of the short texts. Allowing for the insufficient consideration of the sentence semantic and structure information in similarity calculation between two short texts, we propose a novel method of short text similarity calculation based on double vector space model on the basis of traditional vector space model. Creatively transforming traditional vector space model into double vector space model. We utilize the numeral data link relations of Wikipedia to calculate semantic similarity between words, and calculate text structure similarity by dependency trees. Finally, we get the synthetic similarity by combining the semantic similar vector and structure similar vector. Our experiment results demonstrate that the proposed method has higher accuracy than other methods.
منابع مشابه
Corpus-Based methods for Short Text Similarity
This paper presents corpus-based methods to find similarity between short text (sentences, paragraphs, ...) which has many applications in the field of NLP. Previous works on this problem have been based on supervised methods or have used external resources such as WordNet, British National Corpus etc. Our methods are focused on unsupervised corpus-based methods. We present a new method, based ...
متن کاملSimilarity Calculation Method of Chinese Short Text Based on Semantic Feature Space
In order to improve the accuracy of short text similarity calculation, this paper presents the idea that use the history of short text messages to construct semantic feature space, then use the vector in semantic feature space to represent short text and do semantic extension, and finally calculate the short text similarity of corresponding vector in the semantic feature space. This method can ...
متن کاملNamed Entity Recognition in Persian Text using Deep Learning
Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
متن کاملText Document Clustering based on Phrase
Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar based clustering. In this paper novel text document clustering algorithm has been developed based on vector space model, phrases and affinity propagation clustering algorithm. Proposed algorithm can be called Phrase affinity clustering (PAC). PAC first finds the phrase by ukkonen suffix tree con...
متن کاملA new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators
Plenty of researches have been carried out, focusing on the measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs).However, most of them are single-valued measures and lack of potential for efficiency validation.In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators.The vector is defined as a two-tuple consisting of ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016